5 Primary hypotheses
5.1 Estimated probability of superiority
After seeing information about the medication along with a figure, we asked people to estimate the probability of superiority for the treatment over the control condition. Specifically, we asked “What is your best estimate of the probability that a randomly selected patient in the treatment group recovered more quickly than a randomly selected patient in the control group? (A 50% probability would indicate no difference in outcomes between the treatment and control groups.)”
5.1.1 Between-subjects effect of first visualization seen on first probability of superiority estimate
This looks at just the between-subjects effect that seeing SEs vs. SDs had by comparing people’s responses to only the first visualization condition they saw, ignoring the second estimates. For both medication types, people estimated the probability of superiority to be higher, on average, when they saw visualizations with SEs than with SDs.
5.1.1.1 Plots
This is plotted in two ways below, first means + one standard error, and then full distributions of responses. In the second type of plot the means and standard errors are indicated by points and horizontal error bars.
It’s interesting—and a bit puzzling—to see how the average estimates compare to the actual probabilities of superiorities, indicated by the dashed lines in each panel.
The distributions show that SEs result in many more “close to 100%” type responses.
turkers_estimated_psup <- plot_beeswarm_two_conditions_with_truth(df.turkers, true_effects, "first_condition", "first_superiority_estimate") +
facet_wrap(~ medication_type) +
labs(y="Estimated probability of superiority") +
xlab("First visualization seen")
turkers_estimated_psup
ggsave("figures/turkers_estimated_psup.pdf", width=6, height=4)
summary_stats_providers <- df.turkers %>% group_by(medication_type, first_condition) %>%
summarize(mu=mean(first_superiority_estimate),
se=sd(first_superiority_estimate)/sqrt(n()))
## `summarise()` has grouped output by 'medication_type'. You can override using the `.groups` argument.
blood_se_first_mu <- (summary_stats_providers %>% filter(medication_type == "Blood pressure scenario" & first_condition == "Saw SEs first"))$mu
blood_sd_first_mu <- (summary_stats_providers %>% filter(medication_type == "Blood pressure scenario" & first_condition == "Saw SDs first"))$mu
covid_se_first_mu <- (summary_stats_providers %>% filter(medication_type == "COVID-19 scenario" & first_condition == "Saw SEs first"))$mu
covid_sd_first_mu <- (summary_stats_providers %>% filter(medication_type == "COVID-19 scenario" & first_condition == "Saw SDs first"))$mu
blood_psup_test <- (df.turkers %>%
filter(medication_type == "Blood pressure scenario") %>%
t.test(first_superiority_estimate ~ first_condition, data = .) %>%
apa_print())$statistic %>%
remove_dollar_signs()
covid_psup_test <- (df.turkers %>%
filter(medication_type == "COVID-19 scenario") %>%
t.test(first_superiority_estimate ~ first_condition, data = .) %>%
apa_print())$statistic %>%
remove_dollar_signs()
# Extreme values
blood_extreme_se_perc <- round(100*(df.turkers %>%
filter(medication_type == "Blood pressure scenario" & first_condition == "Saw SEs first") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
summarize(prop_extreme = mean(extreme)))$prop_extreme)
blood_extreme_sd_perc <- round(100*(df.turkers %>%
filter(medication_type == "Blood pressure scenario" & first_condition == "Saw SDs first") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
summarize(prop_extreme = mean(extreme)))$prop_extreme)
covid_extreme_se_perc <- round(100*(df.turkers %>%
filter(medication_type == "COVID-19 scenario" & first_condition == "Saw SEs first") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
summarize(prop_extreme = mean(extreme)))$prop_extreme)
covid_extreme_sd_perc <- round(100*(df.turkers %>%
filter(medication_type == "COVID-19 scenario" & first_condition == "Saw SDs first") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
summarize(prop_extreme = mean(extreme)))$prop_extreme)
blood_extreme_test <- (df.turkers %>%
filter(medication_type == "Blood pressure scenario") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
t.test(extreme ~ first_condition, data=.) %>%
apa_print())$statistic %>%
remove_dollar_signs()
covid_extreme_test <- (df.turkers %>%
filter(medication_type == "COVID-19 scenario") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
t.test(extreme ~ first_condition, data=.) %>%
apa_print())$statistic %>%
remove_dollar_signs()
c(blood_psup_test, covid_psup_test, blood_extreme_test, covid_extreme_test)
## [1] "t(244.81) = 4.86, p < .001" "t(229.87) = 4.54, p < .001"
## [3] "t(202.15) = 4.02, p < .001" "t(145.50) = 4.77, p < .001"
5.1.1.2 Significance tests
Our primary, confirmatory hypothesis is that participants will estimate a greater probability of superiority for the treatment when the results of a hypothetical RCT communicate inferential uncertainty (i.e., by displaying standard errors around a point estimate) than when these results communicate outcome uncertainty (i.e., by displaying standard deviations).
The first set of tests of this hypothesis is a between-subjects test using two separate, independent-groups t-tests. Specifically, within each of the two medical scenarios, we compare probability of superiority estimates between participants who see the inferential uncertainty condition first to participants who see the outcome uncertainty condition first.
df.turkers %>%
group_by(medication_type) %>%
do(tidy(t.test(first_superiority_estimate ~ first_condition,
data = .,
conf.level = 0.95,
alternative = c("greater")))) %>%
kable()
| medication_type | estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|---|
| Blood pressure scenario | 8.416188 | 75.72388 | 67.30769 | 4.859116 | 1.1e-06 | 244.8105 | 5.556413 | Inf | Welch Two Sample t-test | greater |
| COVID-19 scenario | 7.446767 | 76.43089 | 68.98413 | 4.543675 | 4.5e-06 | 229.8704 | 4.740058 | Inf | Welch Two Sample t-test | greater |
We see a statistically significant difference: in both medical scenarios, those who saw SEs first gave higher estimates, on average, than those who saw SDs first. This supports our primary hypothesis.
5.1.2 Within-subjects effect of seeing one visualization vs. the other on probability of superiority
This considers both estimates that people made, comparing their estimated probability of superiority for the SD visualization to the same estimate for the SE visualization. This is broken down by which medication type they were shown and which visualization they saw first.
5.1.2.1 Plots
Regardless of which visualization people saw first, they estimated the probability of superiority to be higher when they saw SEs compared to SDs.
plot_data_turkers <- df.turkers %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
group_by(medication_type) %>%
summarize(mu = mean(diff_superiority_se_vs_sd),
se = sd(diff_superiority_se_vs_sd) / sqrt(n())) %>%
mutate(y = 1:n())
kable(plot_data_turkers)
| medication_type | mu | se | y |
|---|---|---|---|
| Blood pressure scenario | 8.533864 | 0.9622555 | 1 |
| COVID-19 scenario | 7.433735 | 0.9255478 | 2 |
df.turkers %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
ggplot(aes(x=medication_type, y=diff_superiority_se_vs_sd)) +
geom_point(mapping=aes(color=medication_type), position = position_jitter(width=0.1)) +
# scale_color_manual(values=c("#af8dc3", "#7fbf7b")) +
stat_summary(fun = mean, geom = "point", size=1.2) +
stat_summary(fun.data="mean_se", fun.args = list(mult=1),
geom="errorbar", color = "black", width=0.1) +
theme(legend.position = "none") +
geom_hline(yintercept = 0, linetype = 'dashed') +
xlab("Scenario") +
ylab("Drop in estimated probability of superiority\nwhen seeing SDs vs. SEs")
ggsave(here("analysis/figures/turkers_within_subjects_drop_in_psup.pdf"), width=4, height=4)
within_subjects_diff_statistics <- df.turkers %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
group_by(medication_type) %>%
summarize(
mu=mean(diff_superiority_se_vs_sd),
se=sd(diff_superiority_se_vs_sd)/sqrt(n())
)
blood_within_mu <- (within_subjects_diff_statistics %>% filter(medication_type == "Blood pressure scenario"))$mu
blood_within_se <- (within_subjects_diff_statistics %>% filter(medication_type == "Blood pressure scenario"))$se
covid_within_mu <- (within_subjects_diff_statistics %>% filter(medication_type == "COVID-19 scenario"))$mu
covid_within_se <- (within_subjects_diff_statistics %>% filter(medication_type == "COVID-19 scenario"))$se
ggplot(plot_data_turkers, aes(x = medication_type, y = mu)) +
geom_pointrange(aes(ymin = mu - se, ymax = mu + se)) +
geom_hline(yintercept = 0, linetype = 'dashed') +
#facet_wrap(~ medication_type, scale = "free_x") +
#scale_y_continuous(lim = c(0, 10)) +
labs(x = '',
y = 'Drop in estimated probability of superiority\nwhen seeing SDs vs. SEs',
title = 'Average drop in probability of superiority estimate when seeing SDs vs. SEs',
subtitle = 'Bars show one standard error') +
coord_flip()
scale <- 0.7
df.turkers %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
ggplot(aes(x = diff_superiority_se_vs_sd, y = medication_type)) +
geom_density_ridges(stat = "binline", binwidth = 1, scale = scale) +
geom_errorbarh(data = plot_data_turkers, aes(x = mu, xmin = mu - se, xmax = mu + se, y = y + scale), height = 0.1) +
geom_point(data = plot_data_turkers, aes(x = mu, y = y + scale)) +
coord_cartesian(ylim = c(1.25, 2.25)) +
labs(x = 'Drop in estimated probability of superiority\nwhen seeing SDs vs. SEs',
y = 'First visualization seen',
title = 'Distribution of drop in probability of superiority estimate\nwhen seeing SDs vs. SEs') +
#facet_wrap(~ medication_type, ncol = 1) +
theme(legend.position = "none")
## Warning: Ignoring unknown aesthetics: x
We can also look at the full set of responses. For each person we plot their two estimates (one for when they saw SDs and one for when they saw SEs) connected by a line. The slope of the line indicates how much they changed their estimates, with a positive slope indicating that the estimate when seeing SEs was higher than when seeing SDs. Negatively-sloped lines are colored blue, and as hypothesized, the majority of people revised in this direction (somewhere between 60 and 70 percent of people, depending on the condition).
plot_data <- df.turkers %>%
select(worker_id, medication_type, first_condition,
superiority_estimate_sd_lower_than_se,
superiority_estimate_sd_equal_to_se,
superiority_estimate_sd_higher_than_se,
superiority_estimate_sd, superiority_estimate_se) %>%
gather("variable", "value", superiority_estimate_sd, superiority_estimate_se) %>%
mutate(variable = case_when(variable == 'superiority_estimate_sd' ~ 'When seeing SDs',
variable == 'superiority_estimate_se' ~ 'When seeing SEs'),
variable = factor(variable, levels = c('When seeing SEs', 'When seeing SDs')))
plot_data %>%
group_by(medication_type, first_condition) %>%
summarize(frac_sd_lower_than_se = mean(superiority_estimate_sd_lower_than_se),
frac_sd_equal_to_se = mean(superiority_estimate_sd_equal_to_se),
frac_sd_higher_than_se = mean(superiority_estimate_sd_higher_than_se)) %>%
kable()
## `summarise()` has grouped output by 'medication_type'. You can override using the `.groups` argument.
| medication_type | first_condition | frac_sd_lower_than_se | frac_sd_equal_to_se | frac_sd_higher_than_se |
|---|---|---|---|---|
| Blood pressure scenario | Saw SEs first | 0.5970149 | 0.2164179 | 0.1865672 |
| Blood pressure scenario | Saw SDs first | 0.6153846 | 0.1965812 | 0.1880342 |
| COVID-19 scenario | Saw SEs first | 0.5284553 | 0.2520325 | 0.2195122 |
| COVID-19 scenario | Saw SDs first | 0.5793651 | 0.2222222 | 0.1984127 |
ggplot(plot_data, aes(x = variable, y = value, group = worker_id, color = superiority_estimate_sd_lower_than_se)) +
geom_point() +
geom_line(alpha = 0.25) +
facet_grid(medication_type ~ first_condition) +
labs(x = '',
y = 'Estimated probability of superiority',
title = 'Change in probability of superiority estimates when seeing SEs vs. SDs') +
theme(legend.position = "none")
5.1.2.2 Significance tests
The second set of tests of our primary hypothesis is a within-subjects test using two separate, independent-groups t-tests. Specifically, within each of the two medical scenarios, we compute a within-participant change score as participants’ probability of superiority estimate in the inferential uncertainty condition minus their estimate in the outcome uncertainty condition. We test whether there is statistical significance for these change scores being greater than zero.
df.turkers %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
group_by(medication_type) %>%
do(tidy(t.test(x = .$diff_superiority_se_vs_sd,
mu = 0,
conf.level = 0.95,
type = "two.sample",
alternative = c("greater")))) %>%
kable()
| medication_type | estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|
| Blood pressure scenario | 8.533864 | 8.868606 | 0 | 250 | 6.945208 | Inf | One Sample t-test | greater |
| COVID-19 scenario | 7.433735 | 8.031714 | 0 | 248 | 5.905636 | Inf | One Sample t-test | greater |
We see a statistically significant difference: in both medical scenarios, we see that’s people’s estimates for the probability of superiority were higher, on average, when they made the estimate after seeing SEs compared to making the estimate after seeing SDs, leading to a positive change score. This supports our primary hypothesis.
An alternative is to just look at the fraction of people who had a lower estimate when they saw SDs compared to when they saw SEs, implemented as a sign test. This ignores the magnitude of changes and just considers direction.
df.turkers %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
group_by(medication_type) %>%
do(tidy(DescTools::SignTest(.$diff_superiority_se_vs_sd, mu = 0)))
## # A tibble: 2 × 9
## # Groups: medication_type [2]
## medication_type estimate statistic p.value parameter conf.low.lwr.ci
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Blood pressure scenario 5 152 4.16e-14 199 5
## 2 COVID-19 scenario 5 138 3.50e-10 190 0
## # … with 3 more variables: conf.high.upr.ci <dbl>, method <chr>,
## # alternative <chr>
Again we see a statistically significant difference for both scenarios.